arXiv:1502.04281v1 [cs.DC] 15 Feb 2015


FrogWild! - Fast PageRank Approximations on Graph Engines


Ioannis Mitliagkas
ECE, UT Austin 

ioannis@utexas.edu 


Michael Borokhovich 

ECE, UT Austin 

michaelbor@utexas.edu 


Alexandros G. Dimakis

ECE, UT Austin

dimakis@austin.utexas.edu


Constantine Caramanis

ECE, UT Austin

constantine@utexas.edu


ABSTRACT 

We propose FrogWild, a novel algorithm for fast approximation of high PageRank vertices, geared towards reducing the network costs of running traditional PageRank algorithms. Our algorithm can be seen as a quantized version of power iteration that performs multiple parallel random walks over a directed graph. One important innovation is a modification to the GraphLab framework that only partially synchronizes mirror vertices. This partial synchronization vastly reduces the network traffic generated by traditional PageRank algorithms, thus greatly reducing the per-iteration cost of PageRank. On the other hand, partial synchronization also creates dependencies between the random walks used to estimate PageRank. Our main theoretical innovation is the analysis of the correlations introduced by this partial synchronization process and a bound establishing that our approximation is close to the true PageRank vector.

We implement our algorithm in GraphLab and compare it against the default PageRank implementation. We show that our algorithm is very fast, performing each iteration in less than one second on the Twitter graph, and can be up to 7x faster than the standard GraphLab PageRank implementation.


1. INTRODUCTION 

Large-scale graph processing is becoming increasingly important for the analysis of data from social networks, web pages, bioinformatics and recommendation systems. Graph algorithms are difficult to implement in distributed computation frameworks like Hadoop MapReduce and Spark. For this reason several in-memory graph engines like Pregel, Giraph, GraphLab and GraphX [24, 23, 35, 31] are being developed. There is no full consensus on the fundamental abstractions of graph processing frameworks, but certain patterns such as vertex programming and the Bulk Synchronous Parallel (BSP) framework seem to be increasingly popular.

PageRank computation [27], which gives an estimate of the importance of each vertex in the graph, is a core component of many search routines; more generally, it represents, de facto, one of the canonical tasks performed using such graph processing frameworks. Indeed, while important in its own right, it also represents the memory, computation and communication challenges to be overcome in large-scale iterative graph algorithms.


In this paper we propose a novel algorithm for fast approximate calculation of high PageRank vertices. Note that even though most previous works calculate the complete PageRank vector (of length in the millions or billions), in many graph analytics scenarios a user wants a quick estimation of the most important or relevant nodes: distinguishing the 10th most relevant node from the 1,000th most relevant is important; the 1,000,000th from the 1,001,000th much less so. A simple solution is to run the standard PageRank algorithm for fewer iterations (or with an increased tolerance). While this certainly incurs less overall cost, the per-iteration cost remains the same; more generally, the question remains whether there is a more efficient way to approximately recover the heaviest PageRank vertices.

There are many real-life applications that may benefit from a fast top-k PageRank algorithm. One example is growing the loyalty of influential customers [?]. In this application, a telecom company identifies the top-k influential customers using the top-k PageRank on the customers' activity (e.g., calls) graph. Then, the company invests its limited budget on improving the user experience for these top-k customers, since they are most important for building a good reputation. Another interesting example is an application of PageRank for finding keywords and key sentences in a given text. In [25], the authors show that PageRank performs better than known machine learning techniques for keyword extraction. Each unique word (noun, verb or adjective) is regarded as a vertex, and there is an edge between two words if they occur in close proximity in the text. Using approximate top-k PageRank, we can identify the top-k keywords much faster than by obtaining the full ranking. When keyword extraction is used by time-sensitive applications or for an ongoing analysis of a large number of documents, speed becomes a crucial factor. The last example we describe here is the application of PageRank to online social networks (OSNs). In the context of OSNs, it is important to be able to predict which users will remain active in the network for a long time. Such key users play a decisive role in developing effective advertising strategies and sophisticated customer loyalty programs, both vital for generating revenue [?]. Moreover, the remaining users can be leveraged, for instance, for targeted marketing or premium services. It is shown in [19] that PageRank is a much more effective predictive measure than other centrality measures. The main innovation of [19] is the use of a mixture of connectivity and activity graphs for the PageRank calculation. Since these graphs are highly dynamic (especially the user activity graph), PageRank must be recalculated constantly. Moreover, the key users constitute only a small fraction of the total number of users; thus, a fast approximation for the top-PageRank nodes constitutes a desirable alternative to the exact solution.

In this paper we address this problem. Our algorithm (called FrogWild for reasons that will become subsequently apparent) significantly outperforms the simple reduced-iterations heuristic in terms of running time, network communication and scalability. We note that, naturally, we compare our algorithm and reduced-iteration PageRank within the same framework: we implemented our algorithm in GraphLab PowerGraph and compare it against the built-in PageRank implementation. A key part of our contribution also involves the proposal of what appears to be a technically minor modification within the GraphLab framework, but which nevertheless results in significant network-traffic savings and which we believe may be of more general interest beyond PageRank computations.

Contributions: We consider the problem of fast and efficient (in the sense of time, computation and communication costs) computation of the high PageRank nodes, using a graph engine. To accomplish this we propose and analyze a new PageRank algorithm specifically designed for the graph engine framework, and, significantly, we propose a modification of the standard primitives of the graph engine framework (specifically, GraphLab PowerGraph) that enables significant network savings. We explain both our objectives and our key innovations in further detail.

Rather than seek to recover the full PageRank vector, we aim for the top k PageRank vertices (where k is typically on the order of 10 to 1000). Given an output list of k vertices, we define two natural accuracy metrics that compare the true top-k list with our output. The algorithm we propose, FrogWild, operates by starting a small (sublinear in the number of vertices n) number of random walkers (frogs) that jump randomly on the directed graph. The random walk interpretation of PageRank enables the frogs to jump to a completely random vertex (teleport) with some constant probability (set to 0.15 in our experiments, following standard convention). After we allow the frogs to jump for time equal to the mixing time of this non-reversible Markov chain, their positions are sampled from the invariant distribution $\pi$, which is normalized PageRank. The standard PageRank iteration can be seen as the continuous limit of this process (i.e., the frogs become water), which is equivalent to power iteration for stochastic matrices.

The main algorithmic contributions of this paper comprise the following three innovations. First, we argue that discrete frogs (a quantized form of power iteration) are significantly better for distributed computation when one is interested only in the large entries of the eigenvector $\pi$. This is because each frog produces an independent sample from $\pi$. If some entries of $\pi$ are substantially larger and we only want to determine those, a small number of independent samples suffices. We make this formal using standard Chernoff bounds (see also [?, 14] for similar arguments). In contrast, during standard PageRank iterations, vertices pass messages to all their out-neighbors, since a non-zero amount of water must be transferred. This tremendously increases the network bandwidth, especially when the graph engine runs over a cluster with many machines.
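As a toy illustration of this sampling argument (not the paper's code), the sketch below draws independent samples from a hypothetical distribution standing in for $\pi$ and ranks vertices by how many samples land on them. The distribution, the sample size and the helper name `top_k_from_samples` are assumptions chosen for illustration; the point is that the number of samples needed is governed mainly by how far the heavy entries stand out from the rest.

```python
import random
from collections import Counter


def top_k_from_samples(pi, num_samples, k, seed=0):
    """Estimate the k heaviest entries of the distribution `pi` from
    `num_samples` independent draws, treating each draw as the final
    position of one independent frog (hypothetical helper)."""
    rng = random.Random(seed)
    # Each frog lands on vertex i with probability pi[i].
    positions = rng.choices(range(len(pi)), weights=pi, k=num_samples)
    counts = Counter(positions)
    # Report the k vertices on which the most frogs landed.
    return {vertex for vertex, _ in counts.most_common(k)}


# Toy stand-in for pi: 3 heavy vertices hold 60% of the mass and the
# remaining 40% is spread evenly over 997 light vertices.
n = 1000
pi = [0.2, 0.2, 0.2] + [0.4 / (n - 3)] * (n - 3)

# 500 frogs, i.e. half a frog per vertex on average.
print(top_k_from_samples(pi, num_samples=500, k=3))
```

Even with fewer frogs than vertices, the heavy vertices collect roughly a hundred samples each while a light vertex rarely collects more than one or two.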


One major issue with simulating discrete frogs in a graph engine is teleportation. Graph frameworks partition vertices to physical nodes and restrict communication to the edges of the underlying graph. Global random jumps would create dense messaging patterns that would increase communication. Our second innovation is a way of obtaining an identical sampling behavior without teleportations. We achieve this by initiating the frogs at uniformly random positions and having them perform random walks for a life span that follows a geometric random variable. The geometric probability distribution depends on the teleportation probability and can be calculated explicitly.
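The identity behind this trick can be checked numerically. Writing $P$ for the column-stochastic transition matrix of the original graph and $u$ for the uniform vector (both formalized in Section 2.1), a frog that starts uniformly and takes $X$ steps of $P$, with $\mathbb{P}(X = x) = p_T(1-p_T)^x$, ends at a vertex distributed as $p_T \sum_{x \ge 0} (1-p_T)^x P^x u$. This series solves $\pi = (1-p_T)P\pi + p_T u$, which is the PageRank fixed point. The sketch below compares the truncated series with power iteration on the teleporting chain for a small hypothetical graph; the graph and the function names are illustrative only.

```python
def apply_p(vec, out_edges):
    """Return P @ vec: vertex j spreads vec[j] evenly over its successors."""
    out = [0.0] * len(vec)
    for j, succ in enumerate(out_edges):
        for i in succ:
            out[i] += vec[j] / len(succ)
    return out


def pagerank_teleport(out_edges, p_t=0.15, iters=200):
    """Power iteration on the teleporting chain: pi <- (1 - p_t) P pi + p_t u.
    The total mass stays 1 because every vertex has a successor."""
    n = len(out_edges)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [(1 - p_t) * x + p_t / n for x in apply_p(pi, out_edges)]
    return pi


def geometric_lifetime(out_edges, p_t=0.15, max_steps=200):
    """End-position distribution of a frog that starts uniformly and takes
    X ~ Geom(p_t) steps of P without teleporting:
    sum_x p_t (1 - p_t)^x P^x u, truncated at max_steps."""
    n = len(out_edges)
    walk = [1.0 / n] * n                  # P^x u, starting from x = 0
    dist = [0.0] * n
    for x in range(max_steps + 1):
        weight = p_t * (1 - p_t) ** x     # P(X = x)
        dist = [d + weight * w for d, w in zip(dist, walk)]
        walk = apply_p(walk, out_edges)
    return dist


# Hypothetical 5-vertex graph; out_edges[j] lists the successors of vertex j.
out_edges = [[1, 2], [2], [0, 3], [4, 0], [0]]
a = pagerank_teleport(out_edges)
b = geometric_lifetime(out_edges)
print(max(abs(x - y) for x, y in zip(a, b)))   # agreement up to truncation error
```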

Our third innovation involves a simple proposed modification for graph frameworks. Most modern graph engines (like GraphLab PowerGraph [?]) employ vertex-cuts as opposed to edge-cuts. This means that each vertex of the graph is assigned to multiple machines so that graph edges see a local vertex mirror. One copy is assigned to be the master and maintains the master version of vertex data, while the remaining replicas are mirrors that maintain local cached read-only copies of the data. Changes to the vertex data are made to the master and then replicated to all mirrors at the next synchronization barrier. This architecture is highly suitable for graphs with high-degree vertices (as most real-world graphs are) but has one limitation when used for a few random walks: imagine that vertex $v_1$ contains one frog that wants to jump to $v_2$. If vertex $v_1$ has very high degree, it is very likely that multiple replicas of that vertex exist, possibly one in each machine in the cluster. In an edge-cut scenario only one message would travel from $v_1$ to $v_2$, assuming $v_1$ and $v_2$ are located in different physical nodes. However, when vertex-cuts are used, the state of $v_1$ is updated (i.e., it now contains no frogs) and this needs to be communicated to all mirrors. It is therefore possible that a single random walk step creates a number of messages equal to the number of machines in the cluster.

We modify PowerGraph to expose a scalar parameter $p_s$ per vertex. By default, when the framework is running, in each super-step all masters synchronize their programs and vertex data with their mirrors. Our modification is that for each mirror we flip an independent coin and synchronize with probability $p_s$. Note that when the master does not synchronize the vertex program with a replica, that replica will not be active during that super-step. Therefore, by performing limited synchronization in a randomized way, we avoid both the communication and the CPU execution on the skipped replicas.
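A back-of-the-envelope sketch of the traffic argument: if a high-degree vertex has $R$ mirrors, a super-step that moves even a single frog off it costs $R$ synchronization messages under full synchronization, but only $R p_s$ in expectation when each mirror is synchronized independently with probability $p_s$. The toy model below counts only these coin flips (it ignores message sizes, batching and where the edges live); the function name and the numbers are hypothetical.

```python
import random


def sync_messages(num_mirrors, p_s, rng):
    """Mirrors a master synchronizes with in one super-step when each mirror
    is picked independently with probability p_s (toy model)."""
    return sum(rng.random() < p_s for _ in range(num_mirrors))


rng = random.Random(42)
num_mirrors = 20        # e.g. a high-degree vertex replicated on 20 machines
trials = 10_000
for p_s in (1.0, 0.5, 0.1):
    avg = sum(sync_messages(num_mirrors, p_s, rng) for _ in range(trials)) / trials
    print(f"p_s = {p_s}: {avg:.2f} messages per super-step "
          f"(expected {num_mirrors * p_s:.1f})")
```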

FrogWild is therefore executed asynchronously but relies on the Bulk Synchronous execution mode of PowerGraph, with the additional simple randomization we explained. The name of our algorithm is inspired by HogWild [29], a lock-free asynchronous stochastic gradient descent algorithm proposed by Niu et al. We note that PowerGraph does support an asynchronous execution mode [17], but we implemented our algorithm by a small modification of synchronous execution. As discussed in [17], the design of asynchronous graph algorithms is highly nontrivial and involves locking protocols and other complications. Our suggestion is that for the specific problem of simulating multiple random walks on a graph, simply randomizing synchronization can give significant benefits while keeping the design simple.

While the parameter $p_s$ clearly has the power to significantly reduce network traffic (and indeed, this is precisely borne out by our empirical results), it comes at a cost: the standard analysis of the Power Method iteration no longer applies. The main challenge that arises is the theoretical analysis of the FrogWild algorithm. The model is that each vertex is separated across machines and each connection between two vertex copies is present with probability $p_s$. A single frog performing a random walk on this new graph defines a new Markov chain, and this can be easily designed to have the same invariant distribution $\pi$, equal to normalized PageRank. The complication is that the trajectories of frogs are no longer independent: if two frogs are in vertex $v_1$ and (say) only one of its mirrors synchronizes, both frogs will need to jump through edges connected with that particular mirror. Worse still, this correlation effect grows the more we seek to reduce network traffic by further decreasing $p_s$. Therefore, it is no longer true that one obtains independent samples from the invariant distribution $\pi$. Our theoretical contribution is the development of an analytical bound showing that these dependent random walks can still be used to obtain an estimate that is provably close to $\pi$ with high probability. We rely on a coupling argument combined with an analysis of pairwise intersection probabilities for random walks on graphs. In our convergence analysis we use the contrast bound [12] for non-reversible chains.


1.1 Notation

Lowercase letters denote scalars or vectors. Uppercase letters denote matrices. The $(i,j)$ element of a matrix $A$ is $A_{ij}$. We denote the transpose of a matrix $A$ by $A'$. For a time-varying vector $x$, we denote its value at time $t$ by $x^t$. When not otherwise specified, $\|x\|$ denotes the $\ell_2$-norm of vector $x$. We use $\Delta^{n-1}$ for the probability simplex in $n$ dimensions, and $e_i \in \Delta^{n-1}$ for the indicator vector for item $i$. For example, $e_1 = [1, 0, \ldots, 0]$. For the set of all integers from 1 to $n$ we write $[n]$.


2. PROBLEM AND MAIN RESULTS

We now make precise the intuition and outline given in the introduction. We first define the problem, giving the definition of PageRank, the PageRank vector, and therefore its top elements. We then define the algorithm, and finally state our main analytical results.

2.1 Problem Formulation

Consider a directed graph $G = (V, E)$ with $n$ vertices ($|V| = n$) and let $A$ denote its adjacency matrix. That is, $A_{ij} = 1$ if there is an edge from $j$ to $i$; otherwise, the value is 0. Let $d_{out}(j)$ denote the number of successors (out-degree) of vertex $j$ in the graph. We assume that all nodes have at least one successor, $d_{out}(j) > 0$. Then we can define the transition probability matrix $P$ as follows:

$$P_{ij} = A_{ij} / d_{out}(j). \qquad (1)$$

The matrix is left-stochastic, which means that each of its columns sums to 1. We call $G(V, E)$ the original graph, as opposed to the PageRank graph, which includes a probability of transitioning to any given vertex. We now define this transition probability matrix, and the PageRank vector.

Definition 1 (PageRank [27]). Consider the matrix

$$Q = (1 - p_T) P + \frac{p_T}{n} \mathbf{1}_{n \times n},$$

where $p_T \in [0, 1]$ is a parameter, most commonly set to 0.15, and $\mathbf{1}_{n \times n}$ is the all-ones matrix. The PageRank vector $\pi \in \Delta^{n-1}$ is defined as the principal right eigenvector of $Q$. That is, $\pi = v_1(Q)$. By the Perron-Frobenius theorem, the corresponding eigenvalue is 1. This implies the fixed-point characterization of the PageRank vector, $\pi = Q\pi$.

The PageRank vector assigns high values to important nodes. Intuitively, important nodes have many important predecessors (other nodes that point to them). This recursive definition is what makes PageRank robust to manipulation, but also expensive to compute. It can be recovered by exact eigendecomposition of $Q$, but at real problem scales this is prohibitively expensive. In practice, engineers often use a few iterations of the power method to get a "good-enough" approximation.

The definition of PageRank hinges on the left-stochastic matrix $Q$, suggesting a connection to Markov chains. Indeed, this connection is well documented and studied [16]. An important property of PageRank from its random walk characterization is the fact that $\pi$ is the invariant distribution for a Markov chain with dynamics described by $Q$. A non-zero $p_T$, also called the teleportation probability, introduces a uniform component to the PageRank vector $\pi$. We see in our analysis that this implies ergodicity and faster mixing for the random walk.

2.1.1 Top PageRank Elements

Given the true PageRank vector $\pi$ and an estimate $v$ given by an approximate PageRank algorithm, we define the top-$k$ accuracy using two metrics.

Definition 2 (Mass Captured). Given a distribution $v \in \Delta^{n-1}$, the true PageRank distribution $\pi \in \Delta^{n-1}$ and an integer $k > 0$, we define the mass captured by $v$ as follows:

$$\mu_k(v) = \pi\Big(\operatorname*{argmax}_{|S| = k} v(S)\Big).$$

For a set $S \subseteq [n]$, $v(S) = \sum_{i \in S} v(i)$ denotes the total mass ascribed to the set by the distribution $v \in \Delta^{n-1}$.

Put simply, the set $S^*$ that gets the most mass according to $v$ out of all sets of size $k$ is evaluated according to $\pi$, and that gives us our metric. It is maximized by $\pi$ itself, i.e. the optimal value is $\mu_k(\pi)$.

The second metric we use is the exact identification probability, i.e. the fraction of elements in the output list that are also in the true top-$k$ list. Note that the second metric is limited in that it does not give partial credit for high-PageRank vertices that were not in the top-$k$ list. In our experiments in Section 3 we mostly use the normalized captured mass accuracy metric, but also report the exact identification probability for some cases; typically the results are similar.
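Both metrics are straightforward to compute once $\pi$ is known. The sketch below assumes $\pi$ is obtained by plain power iteration on $Q$ from Definition 1 (a stand-in for any exact solver) and uses a small hypothetical graph given as successor lists; the helper names are hypothetical. Because $v(S)$ over sets of size $k$ is maximized by the $k$ largest entries of $v$, the argmax in Definition 2 reduces to a sort (ties broken arbitrarily).

```python
def pagerank(out_edges, p_t=0.15, iters=100):
    """Power iteration for pi = Q pi, Q = (1 - p_t) P + (p_t / n) 1 1^T,
    with P_ij = A_ij / d_out(j). out_edges[j] lists the successors of j;
    every vertex is assumed to have at least one."""
    n = len(out_edges)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [p_t / n] * n                    # uniform teleportation part
        for j, succ in enumerate(out_edges):
            share = (1 - p_t) * pi[j] / len(succ)
            for i in succ:
                nxt[i] += share
        pi = nxt
    return pi


def top_k(v, k):
    """argmax_{|S| = k} v(S): the indices of the k largest entries of v."""
    return set(sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k])


def mass_captured(v, pi, k):
    """mu_k(v) = pi(argmax_{|S| = k} v(S))  (Definition 2)."""
    return sum(pi[i] for i in top_k(v, k))


def exact_identification(v, pi, k):
    """Fraction of the reported top-k list that is in the true top-k list."""
    return len(top_k(v, k) & top_k(pi, k)) / k


# Hypothetical toy graph and a crude estimate v (normalized in-degree).
out_edges = [[1, 2], [2], [0], [0, 2], [0, 1, 2]]
pi = pagerank(out_edges)
indeg = [sum(i in succ for succ in out_edges) for i in range(len(out_edges))]
v = [d / sum(indeg) for d in indeg]

k = 2
print("normalized mass captured:", mass_captured(v, pi, k) / mass_captured(pi, pi, k))
print("exact identification:", exact_identification(v, pi, k))
```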

We subsequently describe our algorithm. We attempt to approximate the heaviest elements of the invariant distribution of a Markov chain by simultaneously performing multiple random walks on the graph. The main modification to PowerGraph is the exposure of a parameter, $p_s$, that controls the probability that a given master node synchronizes with any one of its mirrors. Per step, this leads to a proportional reduction in network traffic. The main contribution of this paper is to show that we get results of comparable or improved accuracy while maintaining this network traffic advantage. We demonstrate this empirically in Section 3.

2.2 Algorithm

During setup, the graph is partitioned using GraphLab's default ingress algorithm. At this point each one of $N$ frogs is born on a vertex chosen uniformly at random. Each vertex $i$ carries a counter, initially set to 0 and denoted by $c(i)$. Scheduled vertices execute the FrogWild! vertex program given below.

Incoming frogs from previously executed vertex programs are collected by the init() function. At apply(), every frog dies with probability $p_T = 0.15$. This, along with a uniform starting position, effectively simulates the 15% uniform component from Definition 1.

A crucial part of our algorithm is the change in synchronization behaviour. The <sync> step only synchronizes a $p_s$ fraction of mirrors, leading to commensurate gains in network traffic (cf. Section 3). This patch on the GraphLab codebase was only a few lines of code. Section 3.3 contains more details regarding the implementation.

The scatter() phase is only executed for edges $e$ incident to a mirror of $i$ that has been synchronized. Those edges draw a binomial number of frogs to send to their other endpoint. The rest of the edges perform no computation. The frogs sent to vertex $j$ at the last step will be collected at the init() step when $j$ executes.
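A minimal sketch of the scatter step for a single vertex $i$, using the binomial rule of the vertex program. For illustration only, it assumes that the out-edges of $i$ are split round-robin across its replicas and that every replica (master included) is kept with probability $p_s$; the names and the edge-to-replica assignment are hypothetical. Under that assumption each edge is active with probability $p_s$ and draws $\mathrm{Bin}(K(i), 1/(d_{out}(i) p_s))$ frogs, so the expected number of frogs sent is $p_s d_{out}(i) \cdot K(i)/(d_{out}(i) p_s) = K(i)$. Binomials are drawn as sums of Bernoulli trials to stay within the standard library.

```python
import random


def binomial(trials, prob, rng):
    """Bin(trials, prob), drawn as a sum of Bernoulli trials."""
    return sum(rng.random() < prob for _ in range(trials))


def scatter(frogs, successors, num_replicas, p_s, rng):
    """Scatter `frogs` surviving frogs from one vertex i under partial
    synchronization. Edge e is assumed to live on replica e % num_replicas.
    Returns {successor j: frogs sent to j}."""
    d_out = len(successors)
    prob = 1.0 / (d_out * p_s)
    if prob > 1.0:
        raise ValueError("need p_s >= 1 / d_out(i) for a valid binomial parameter")
    # <sync>: each replica is synchronized independently with probability p_s.
    synced = [rng.random() < p_s for _ in range(num_replicas)]
    sent = {}
    for e, j in enumerate(successors):
        if synced[e % num_replicas]:           # only edges on synchronized replicas
            x = binomial(frogs, prob, rng)     # x ~ Bin(K(i), 1 / (d_out(i) p_s))
            if x:
                sent[j] = sent.get(j, 0) + x
    return sent


rng = random.Random(7)
successors = list(range(1, 9))     # d_out(i) = 8, split over 4 replicas
trials = 20_000
total = sum(sum(scatter(10, successors, 4, 0.5, rng).values()) for _ in range(trials))
print(total / trials)    # average frogs sent; should be close to K(i) = 10
```

The number of frogs leaving $i$ in any single super-step fluctuates (it can exceed $K(i)$ or drop to zero), which is exactly the sense in which frogs are only preserved in expectation.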


FrogWild! vertex program

Input parameters: p_s, p_T = 0.15, t

apply(i):
    K(i) ← [# incoming frogs]
    If t steps have been performed: c(i) ← c(i) + K(i) and halt.
    For every incoming frog:
        With probability p_T, the frog dies:
            c(i) ← c(i) + 1,
            K(i) ← K(i) - 1.

<sync>: For every mirror m of vertex i:
    With probability p_s:
        Synchronize state with mirror m.

scatter(e = (i, j)) [only on synchronized mirrors]:
    Generate a binomial number of frogs:
        x ~ Bin(K(i), 1 / (d_out(i) · p_s))
    Send x frogs to vertex j: signal(j, x)

Parameter $p_T$ is the teleportation probability from the random surfer model of PageRank [27]. To get PageRank using random walks, one could adjust the transition matrix $P$ as described in Definition 1 to get the matrix $Q$. Alternatively, the process can be replicated by a random walk following the original matrix $P$ and teleporting at every time step with probability $p_T$. The destination for this teleportation is chosen uniformly at random from $[n]$. We are interested in the position of a walk at a predetermined point in time, as that would give us a sample from $\pi$. This holds as long as we allow enough time for mixing to occur.

Due to the inherent Markovianity of this process, one could just consider it starting from the last teleportation before the predetermined stopping time. When the mixing time is large enough, the number of steps performed between the last teleportation and the predetermined stopping time, denoted by $X$, is geometrically distributed with parameter $p_T$. This follows from the time-reversibility of the teleportation process: inter-teleportation times are geometrically distributed, so as long as the first teleportation event happens before the stopping time, $X \sim \mathrm{Geom}(p_T)$.

This establishes that the FrogWild! process, in which a frog performs a geometrically distributed number of steps following the original transition matrix $P$, closely mimics a random walk that follows the adjusted transition matrix $Q$. In practice, we stop the process after $t$ steps to get a good approximation. To show our main result, Theorem 1, we analyze the latter process.

Using a binomial distribution to independently generate the number of frogs in the scatter() phase closely models the effect of random walks. The marginal distributions are correct, and the number of frogs that did not die during the apply() step is preserved in expectation. For our implementation we resort to a more efficient approach. Assuming $K(i)$ frogs survived the apply() step and $M$ mirrors were picked for synchronization, we send $\lceil K(i)/M \rceil$ frogs to $\min(K(i), M)$ mirrors. If the number of available frogs is less than the number of synchronized mirrors, we pick $K(i)$ of these mirrors arbitrarily.
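Putting the pieces together, the following is a small single-process simulation of the vertex program above, not the GraphLab implementation: $N$ frogs start uniformly, each frog dies with probability $p_T$ at apply() and is tallied where it dies, survivors are tallied after $t$ steps, and scatter() uses the binomial rule restricted to synchronized replicas. The estimate is $c(i)/N$, as in the estimator of Definition 5. The hub-and-ring graph, the two replicas per vertex with round-robin edge assignment, the coin flip on every replica, and all helper names are assumptions made for illustration; it uses the binomial rule, not the $\lceil K(i)/M \rceil$ variant.

```python
import random


def binomial(trials, prob, rng):
    """Bin(trials, prob), drawn as a sum of Bernoulli trials."""
    return sum(rng.random() < prob for _ in range(trials))


def frogwild(out_edges, num_frogs, t, p_s=1.0, p_t=0.15, replicas=2, seed=0):
    """Toy simulation of the FrogWild! super-steps; returns the tallies c(i).
    Edge e of a vertex is assumed to live on replica e % replicas, and every
    replica is synchronized independently with probability p_s."""
    rng = random.Random(seed)
    n = len(out_edges)
    frogs = [0] * n
    for _ in range(num_frogs):              # each frog is born at a uniform vertex
        frogs[rng.randrange(n)] += 1
    c = [0] * n
    for step in range(t + 1):
        incoming = [0] * n
        for i, k in enumerate(frogs):       # apply(i) with K(i) = k
            if k == 0:
                continue
            if step == t:                   # t steps performed: tally and halt
                c[i] += k
                continue
            dead = binomial(k, p_t, rng)    # each frog dies w.p. p_T ...
            c[i] += dead                    # ... and is tallied where it dies
            k -= dead
            succ = out_edges[i]
            r = min(replicas, len(succ))
            synced = [rng.random() < p_s for _ in range(r)]         # <sync>
            prob = min(1.0, 1.0 / (len(succ) * p_s))                # cap only if p_s < 1/d_out
            for e, j in enumerate(succ):                             # scatter
                if synced[e % r]:
                    incoming[j] += binomial(k, prob, rng)
        frogs = incoming
    return c


def pagerank(out_edges, p_t=0.15, iters=100):
    """Reference PageRank by power iteration on Q."""
    n = len(out_edges)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [p_t / n] * n
        for j, succ in enumerate(out_edges):
            for i in succ:
                nxt[i] += (1 - p_t) * pi[j] / len(succ)
        pi = nxt
    return pi


def top_k(v, k):
    """Indices of the k largest entries of v."""
    return set(sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k])


# Hypothetical graph: hubs 0, 1, 2 link to each other; every other vertex
# links to the three hubs and to its successor on a ring of non-hubs.
n = 50
hubs = [0, 1, 2]
out_edges = [[h for h in hubs if h != j] for j in hubs]
out_edges += [hubs + [3 + (j - 2) % (n - 3)] for j in range(3, n)]

N, t, k = 5000, 10, 3
pi = pagerank(out_edges)
for p_s in (1.0, 0.5):
    c = frogwild(out_edges, N, t, p_s=p_s)
    pi_n = [ci / N for ci in c]                # estimator pi_N(i) = c(i) / N
    captured = sum(pi[i] for i in top_k(pi_n, k)) / sum(pi[i] for i in top_k(pi, k))
    print(f"p_s = {p_s}: normalized mass captured = {captured:.3f}")
```

Because the binomial scatter preserves frogs only in expectation, the tallies need not sum exactly to $N$; the top-k ranking is what the estimator is used for here.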


2.3 Main Result 

Our analytical results essentially provide a high probabil¬ 
ity guarantee that our algorithm produces a solution that 
approximates well the PageRank vector. Recall that the 
main modification of our algorithm involves randomizing the 
synchronization between master nodes and mirrors. For our 
analysis, we introduce a broad model to deal with partial 
synchronization, in Appendix [X] 

Our results tell us that partial synchronization does not 
change the distribution of a single random walk. To make 
this and our other results clear, we need the simple defini¬ 
tion. 


Definition 3. We denote the state of random walk $l$ at its $t$-th step by $s_l^t$.

Then, we see that $\mathbb{P}(s_l^{t+1} = i \mid s_l^t = j) = 1/d_{out}(j)$ for every edge from $j$ to $i$. This follows simply from the symmetry assumed in the erasure model of Appendix A. Thus if we were to sample serially, the modification of the algorithm controlling (limiting) synchronization would not affect each sample, and hence would not affect our estimate of the invariant distribution. However, we start multiple (all) random walks simultaneously. In this setting, the fundamental analytical challenge stems from the fact that any random walks that intersect are now correlated. The key to our result is that we can control the effect of this correlation as a function of the parameter $p_s$ and the pairwise probability that two random walks intersect. We define this formally.

Definition 4. Suppose two walkers $l_1$ and $l_2$ start at the same time and perform $t$ steps. The probability that they meet is defined as follows:

$$p_\cap(t) = \mathbb{P}\left(\exists\, \tau \in [0, t] \text{ s.t. } s_{l_1}^{\tau} = s_{l_2}^{\tau}\right). \qquad (2)$$
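For intuition, $p_\cap(t)$ is easy to estimate by simulation. The sketch below uses two independent walks on the teleporting chain $Q$ with independent uniform starting points (the setting of Theorem 2 below), not the coupled FrogWild! process, and records a meeting when both walks occupy the same vertex at the same time $\tau \in [0, t]$. The ring-with-chords graph and the function names are hypothetical.

```python
import random


def q_step(v, out_edges, p_t, rng):
    """One step of the teleporting walk Q: jump to a uniformly random vertex
    with probability p_t, otherwise follow a uniformly random out-edge."""
    if rng.random() < p_t:
        return rng.randrange(len(out_edges))
    return rng.choice(out_edges[v])


def meeting_probability(out_edges, t, p_t=0.15, trials=20_000, seed=0):
    """Monte Carlo estimate of p_cap(t) for two independent walks that start
    at independent, uniformly random vertices."""
    rng = random.Random(seed)
    n = len(out_edges)
    met = 0
    for _ in range(trials):
        a, b = rng.randrange(n), rng.randrange(n)
        for tau in range(t + 1):
            if a == b:                 # same vertex at the same time tau
                met += 1
                break
            if tau < t:
                a = q_step(a, out_edges, p_t, rng)
                b = q_step(b, out_edges, p_t, rng)
    return met / trials


# Hypothetical graph: a ring of 100 vertices with one extra chord per vertex.
n = 100
ring = [[(j + 1) % n, (j + 7) % n] for j in range(n)]
for t in (0, 5, 10):
    print(t, meeting_probability(ring, t))
```

At $t = 0$ the estimate is close to $1/n$, the chance that two uniform starting points coincide; it then grows with $t$.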


Definition 5 (Estimator). Given the positions of $N$ random walks at time $t$, $\{s_l^t\}_{l=1}^{N}$, we define the following estimator for the invariant distribution $\pi$:

$$\pi_N(i) = \frac{|\{l : l \in [N],\ s_l^t = i\}|}{N} = \frac{c(i)}{N}. \qquad (3)$$

Here $c(i)$ refers to the tally maintained by the FrogWild! vertex program.


Now we can state the main result. Here we give a guarantee for the quality of the solution furnished by our algorithm.


Theorem 1 (Main Theorem). Consider $N$ frogs following the FrogWild! process (Section 2.2), under the erasure model of Appendix A. The frogs start at independent locations, distributed uniformly, and stop after a geometric number of steps or, at most, $t$ steps. The estimator $\pi_N$ (Definition 5) captures mass close to the optimal. Specifically, with probability at least $1 - \delta$,

$$\mu_k(\pi_N) > \mu_k(\pi) - \epsilon,$$

where

$$\epsilon < \frac{(1 - p_T)^{t+1}}{p_T} + \sqrt{\frac{k}{\delta}\left[\frac{1}{N} + (1 - p_s^2)\, p_\cap(t)\right]}. \qquad (4)$$


Remark 6 (Scaling). The result in Theorem 1 immediately implies the following scaling for the number of iterations and frogs, respectively. They both depend on the maximum captured mass possible, $\mu_k(\pi)$, and are sufficient for making the error $\epsilon$ of the same order as $\mu_k(\pi)$:

$$t = O\left(\log \frac{1}{\mu_k(\pi)}\right), \qquad N = O\left(\frac{k}{\mu_k(\pi)^2\, \delta}\right).$$
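As a concrete reading of the first term in (4), $(1-p_T)^{t+1}/p_T$, which is the penalty for stopping walks after $t$ steps, the helper below finds the smallest $t$ that keeps this term at or below a target. It is a sketch with hypothetical names. Since (4) is an upper bound, the resulting $t$ is sufficient for this term, not necessary, and it says nothing about the iteration counts used in the experiments.

```python
import math


def truncation_penalty(t, p_t=0.15):
    """First error term of bound (4): (1 - p_T)^(t+1) / p_T."""
    return (1 - p_t) ** (t + 1) / p_t


def min_steps(target, p_t=0.15):
    """Smallest integer t >= 0 with truncation_penalty(t, p_t) <= target."""
    # Solve (1 - p_T)^(t+1) <= target * p_T for t and round up.
    t = max(math.ceil(math.log(target * p_t) / math.log(1 - p_t)) - 1, 0)
    # Guard against floating-point rounding right at the boundary.
    while truncation_penalty(t, p_t) > target:
        t += 1
    while t > 0 and truncation_penalty(t - 1, p_t) <= target:
        t -= 1
    return t


for target in (0.1, 0.01, 0.001):
    print(f"target {target}: t = {min_steps(target)}")
```

For $p_T = 0.15$ the penalty shrinks by a factor of 0.85 per extra step, so each additional decimal digit of accuracy costs roughly 14 to 15 more steps.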

The proof of Theorem 1 is deferred to Appendix B.1. The accuracy guaranteed by this result also depends on the probability that two walkers will intersect. Via a simple argument, that probability is the same as the meeting probability for independent walks. The next theorem bounds this probability.


Theorem 2 (Intersection Probability). Consider two independent random walks obeying the same ergodic transition probability matrix $Q$ with invariant distribution $\pi$, as described in Definition 1. Furthermore, assume that both of them are initially distributed uniformly over the state space of size $n$. The probability that they meet within $t$ steps is bounded as follows:

$$p_\cap(t) \le \frac{1}{n} + \frac{t\, \|\pi\|_\infty}{p_T},$$

where $\|\pi\|_\infty$ denotes the maximal element of the vector $\pi$.


The proof is based on the observation that the $\ell_\infty$ norm of a distribution controls the probability that two independent samples coincide. We show that for all steps of the random walk, that norm is controlled by the $\ell_\infty$ norm of $\pi$. We defer the full proof to Appendix B.2.
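The observation is elementary: for independent draws $X, Y \sim p$, $\mathbb{P}(X = Y) = \sum_i p_i^2 \le \max_i p_i \sum_i p_i = \|p\|_\infty$, with equality for the uniform distribution. A quick numerical sanity check on random distributions (illustrative only; the names are hypothetical):

```python
import random


def collision_probability(p):
    """P(X = Y) for independent X, Y drawn from the distribution p."""
    return sum(x * x for x in p)


def random_distribution(n, rng):
    """A skewed random distribution on n items (weights u^4, u uniform)."""
    w = [rng.random() ** 4 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(3)
worst = max(collision_probability(p) / max(p)
            for p in (random_distribution(50, rng) for _ in range(1000)))
print(f"largest ratio sum_i p_i^2 / ||p||_inf over 1000 samples: {worst:.3f}")
```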

A number of studies give experimental evidence (e.g. [?]) suggesting that PageRank values for the web graph follow a power-law distribution with parameter approximately $\theta = 2.2$. That is true for the tail of the distribution (the largest values, hence of interest to us here) regardless of the choice of $p_T$. The following proposition bounds the heaviest PageRank value, $\|\pi\|_\infty$.

Proposition 7 (Max of Power-Law Distribution). Let $\pi \in \Delta^{n-1}$ follow a power-law distribution with parameter $\theta$ and minimum value $p_T/n$. Its maximum element, $\|\pi\|_\infty$, is at most $n^{-\gamma}$ with probability at least $1 - c\, n^{\gamma - 1/(\theta - 1)}$, for some universal constant $c$.


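Assuming $\theta = 2.2$ and picking, for example, $\gamma = 0.5$, we get that

$$\mathbb{P}\left(\|\pi\|_\infty > \frac{1}{\sqrt{n}}\right) < c\, n^{-1/3}.$$

This implies that with probability at least $1 - c\, n^{-1/3}$ the meeting probability is bounded as follows:

$$p_\cap(t) \le \frac{1}{n} + \frac{t}{p_T \sqrt{n}}.$$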


One would usually take a number of steps $t$ that is either constant or logarithmic with respect to the graph size $n$. This implies that for many reasonable choices of set size $k$ and acceptable probability of failure $\delta$, the meeting probability vanishes as $n$ grows. Then we can make the second term of the error in (4) arbitrarily small by controlling the number of frogs, $N$. The proof of Proposition 7 is deferred to Appendix B.3.
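To put numbers on this, the sketch below evaluates the right-hand side $1/n + t/(p_T\sqrt{n})$ at the size of the Twitter graph from Section 3.2 ($n \approx 41.6$M). The step counts are arbitrary illustrative values, not the settings used in the experiments, and the bound only applies on the event $\|\pi\|_\infty \le 1/\sqrt{n}$; the function name is hypothetical.

```python
import math


def meeting_bound(n, t, p_t=0.15):
    """Right-hand side of p_cap(t) <= 1/n + t / (p_T sqrt(n)), which holds
    whenever ||pi||_inf <= 1/sqrt(n)."""
    return 1.0 / n + t / (p_t * math.sqrt(n))


n_twitter = 41_600_000           # size of the Twitter graph in Section 3.2
for t in (2, 4, 8):
    print(f"t = {t}: p_cap(t) <= {meeting_bound(n_twitter, t):.4f}")
```

The bound grows linearly in $t$ but shrinks like $1/\sqrt{n}$, so on graphs of this size it stays well below one percent for a handful of steps.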


2.4 Prior Work 

There is a very large body of work on computing and approximating PageRank on different computation models (e.g., see [10, 13, 14, 4] and references therein). To the best of our knowledge, our work is the first to specifically design an approximation algorithm for high-PageRank nodes for graph engines. Another line of work looks for Personalized PageRank (PPR) scores. These quantify the influence an arbitrary node $i$ has on another node $j$; cf. recent work [22] and the discussion therein. In [?], the top-k approximation of PPR is studied. However, PPR is not applicable in our case, as we are looking for an answer close to a global optimum.

In [?], a random-walks-based algorithm is proposed. The authors provide some insightful analysis of different variations of the algorithm. They show that starting a single walker from every node is sufficient to achieve a good global approximation. We focus on capturing a few nodes with a lot of mass, hence we can get away with orderwise far fewer frogs than $O(n)$. This is important for achieving low network traffic when the algorithm is executed on a distributed graph framework. Figure ? shows a linear reduction in network traffic when the number of initial walkers decreases. Furthermore, our method does not require waiting for the last frog to naturally expire (note that the geometric distribution has infinite support). We impose a very short time cut-off, $t$, and exactly analyze the penalty in captured mass we pay for it in Theorem 1.

One natural question is how our algorithm compares to, or can be complemented by, graph sparsification techniques. One issue here is that graph sparsification crucially depends on the similarity metric used. Well-studied properties that are preserved by different sparsification methods involve lengths of shortest paths between vertices (such sparsifiers are called spanners, see e.g. [28]), cuts between subsets of vertices, and more generally quadratic forms of the graph Laplacian [?, ?]; see [?] and references therein for a recent overview.

To the best of our knowledge, there are no known graph sparsification techniques that preserve vertex PageRank.

One natural heuristic that one may consider is to independently flip a coin and delete each edge of the graph with some probability $r$. Note that this is crucially different from spectral sparsifiers [33, ?], which choose these probabilities using a process that is already more complicated than estimating PageRank. This simple heuristic of independently deleting edges indeed accelerates the estimation process for high-PageRank vertices. We compare FrogWild to this uniform sparsification process in Figure ?. We present here results for 2 iterations of GraphLab PR on the sparsified graph. Note that running only one iteration is not interesting, since it actually estimates only the in-degree of a node, which is known in advance (i.e., just after graph loading) in a graph engine framework. It can be seen in Figure ? that even when only two iterations are used on the sparsified graph, the running time is significantly worse than that of FrogWild, while the accuracy is comparable.
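For reference, the uniform sparsification baseline is easy to express. The sketch below deletes each edge independently with probability $r$ and then runs two PageRank iterations. It is an illustration under stated assumptions (normalized power iteration on $Q$ from the uniform vector, and vertices left without successors by the deletion redistributing their mass uniformly), not the GraphLab PR code used in the comparison; the random graph, $r$ and the helper names are hypothetical.

```python
import random


def sparsify(out_edges, r, seed=0):
    """Uniform sparsification: delete each edge independently with probability r."""
    rng = random.Random(seed)
    return [[j for j in succ if rng.random() >= r] for succ in out_edges]


def pagerank_iterations(out_edges, iters, p_t=0.15):
    """`iters` power iterations on Q starting from the uniform vector.
    Vertices left without successors spread their mass uniformly
    (an assumption of this sketch)."""
    n = len(out_edges)
    pi = [1.0 / n] * n
    for _ in range(iters):
        dangling = sum(pi[j] for j, succ in enumerate(out_edges) if not succ)
        nxt = [p_t / n + (1 - p_t) * dangling / n] * n
        for j, succ in enumerate(out_edges):
            for i in succ:
                nxt[i] += (1 - p_t) * pi[j] / len(succ)
        pi = nxt
    return pi


def top_k(v, k):
    """Indices of the k largest entries of v."""
    return set(sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k])


# Hypothetical random graph: 200 vertices with 10 random successors each.
rng = random.Random(1)
n = 200
g = [rng.sample(range(n), 10) for _ in range(n)]

exact = pagerank_iterations(g, 100)                    # near-converged reference
approx = pagerank_iterations(sparsify(g, r=0.5), 2)    # 2 iterations, about half the edges
overlap = len(top_k(exact, 10) & top_k(approx, 10))
print(f"{overlap} of the exact top-10 vertices recovered")
```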

Our baseline comparisons come from the graph framework papers, since PageRank is a standard benchmark for running time, network and other computations. Our implementation is on GraphLab (PowerGraph) and significantly outperforms the built-in PageRank algorithm. This algorithm is already shown in [17, 31] to be significantly more efficient than other frameworks like Hadoop, Spark, Giraph etc.

3. EXPERIMENTS 

In this section we compare the performance of our algorithm to the PageRank algorithm shipped with GraphLab v2.2 (PowerGraph) [17]. The fact that GraphLab is the fastest distributed engine for PageRank is established experimentally in [31]. We focus on two algorithms: the basic built-in algorithm provided as part of the GraphLab graph analytics toolkit, referred to here as GraphLab PR, and FrogWild. Since we are looking for a top-$k$ approximation and GraphLab PR is meant to find the entire PageRank vector, we only run it for a small number of iterations (usually 2 are sufficient). This gives us a good top-$k$ approximation and is much faster than running the algorithm until convergence. We also fine-tune the algorithm's tolerance parameter to get a good but fast approximation.

We compare several performance metrics, namely running time, network usage, and accuracy. The metrics do not include the time and network usage required for loading the graph into GraphLab (known as the ingress time); they reflect only the execution stage.

3.1 The Systems 

We perform experiments on two systems. The first system is a cluster of 20 virtual machines, created using VirtualBox 4.3 on a single physical server. The server is based on an Intel® Xeon® CPU E5-1620 with 4 cores at 3.6 GHz and 16 GB of RAM. The second system comprises a cluster of up to 24 EC2 machines on AWS (Amazon Web Services) [3]. We use m3.xlarge instances, based on the Intel® Xeon® CPU E5-2670, with 4 vCPUs and 15 GB RAM.

3.2 The Data 

For the VirtualBox system, we use the LiveJournal graph [21] with 4.8M vertices and 69M edges. For the AWS system, in addition to the LiveJournal graph, we use the Twitter graph, which has 41.6M nodes and 1.4B edges.

3.3 Implementation 

FrogWild is implemented on the standard GAS (gather, apply, scatter) model. We implement init(), apply(), and scatter(). The purpose of init() is to collect the random walks sent to the node by its neighbors using scatter() in the previous iteration. In the first iteration, init() generates a random fraction of the initial total number of walkers. This implies that the initial walker locations are randomly distributed across nodes. FrogWild requires the length of random walks to be geometrically distributed (see Section 2.2). For the sake of efficiency, we impose an upper bound on the length of random walks. The algorithm is executed for a constant number of iterations (experiments show good results with as few as 3 iterations), after which all the random walks are stopped simultaneously. The apply() function is responsible for keeping track of the number of walkers that have stopped on each vertex, and scatter() distributes the walkers still alive to the neighbors of the vertex. The scatter() phase is the most challenging part of the implementation. In order to reduce information exchange between machines, we use a couple of ideas.
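The walker dynamics described above can be simulated on a single machine as a rough illustration. This is a minimal sketch, not the GraphLab vertex program from [11]: `frogwild_simulation` and `top_k_vertices` are hypothetical names, the comments only map each step onto the roles of init(), apply() and scatter(), and stopping walkers that land on a vertex without out-edges is an assumption of the sketch.

```python
import random
from collections import Counter


def frogwild_simulation(out_nbrs, num_frogs, iters, p_t, seed=0):
    """Single-machine sketch of the FrogWild walker dynamics.

    out_nbrs[v] lists the out-neighbours of vertex v. Returns a Counter that maps
    each vertex to the number of walkers ("frogs") that stopped on it.
    """
    rng = random.Random(seed)
    n = len(out_nbrs)
    # First iteration: place the initial walkers uniformly at random.
    alive = Counter(rng.randrange(n) for _ in range(num_frogs))
    stopped = Counter()
    for _ in range(iters):
        arriving = Counter()
        for v, count in alive.items():
            for _ in range(count):
                # Stopping with probability p_t each step makes the walk length geometric.
                # Assumption: walkers on a vertex with no out-edges also stop there.
                if rng.random() < p_t or not out_nbrs[v]:
                    stopped[v] += 1  # role of apply(): count walkers that stopped on v
                else:
                    arriving[rng.choice(out_nbrs[v])] += 1  # role of scatter()
        alive = arriving  # role of init() in the next iteration
    # Iteration cap: all walkers still alive are stopped simultaneously.
    stopped.update(alive)
    return stopped


def top_k_vertices(stopped, k):
    """The k vertices on which the most walkers stopped."""
    return [v for v, _ in stopped.most_common(k)]
```

Dividing `stopped[v]` by `num_frogs` gives the empirical distribution used as the PageRank estimate in this sketch; only its top-$k$ entries are reported.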

First, we notice that random walks do not have identity. Hence, random walks destined for the same neighbor can be combined into a single message. The second optimization, and a significant part of our work, is modifying the GraphLab engine. Recent versions of GraphLab (since PowerGraph) partition the graph by splitting vertices. As a consequence, the engine needs to synchronize all the mirrors of a vertex over the network a number of times during each GAS cycle.
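Because walkers are interchangeable, a scatter step only has to send a count per destination. The following minimal Python sketch (the function name `scatter_messages` is hypothetical, not part of GraphLab) shows the combining idea: 1000 walkers leaving a vertex with three out-neighbors produce at most three messages.

```python
import random
from collections import Counter


def scatter_messages(alive, out_nbrs, rng):
    """Route every alive walker and combine walkers headed to the same neighbour.

    alive maps vertex -> number of walkers on it. Since walkers carry no identity,
    each (source, destination) pair needs a single message holding a count.
    """
    messages = Counter()
    for v, count in alive.items():
        for _ in range(count):
            messages[(v, rng.choice(out_nbrs[v]))] += 1
    return messages


if __name__ == "__main__":
    rng = random.Random(0)
    msgs = scatter_messages({0: 1000}, {0: [1, 2, 3]}, rng)
    print(len(msgs), sum(msgs.values()))  # at most 3 messages carrying 1000 walkers
```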

When running a few random walks, only a handful of neighbors end up receiving walkers. For this reason, synchronizing all mirrors can be very wasteful. We deal with that by implementing randomized synchronization. We expose a parameter $p_s \in [0,1]$ to the user as a small extension to the GraphLab API. It describes the fraction of replicas that will be synchronized. Replicas not synchronized remain idle for the upcoming scatter phase. The above edits to the engine amount to only a few (about 10) lines of code. Note that the $p_s$ parameter is completely optional, i.e., setting it to 1 results in the original engine operation. Hence, other analytic workloads will not be affected. However, any random walk or “gossip” style algorithm (one that sends a single message to a random subset of its neighbors) can benefit by exploiting $p_s$. Our modification of the GraphLab engine as well as the FrogWild vertex program can be found in [11].
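The engine change acts on mirrors, which a single-machine example cannot reproduce; the analysis in Appendix A instead models partial synchronization as edge erasures. Below is a minimal sketch of the "At Least One Out-Edge Per Node" model used in the experiments, applied at the level of individual out-edges. Treating each out-edge as independently synchronized is a simplifying assumption, and both function names are hypothetical.

```python
import random


def preserved_out_edges(nbrs, p_s, rng):
    """'At Least One Out-Edge Per Node' erasures for one vertex and one step.

    Each out-edge is kept independently with probability p_s; if every edge was
    erased, one of them is re-enabled uniformly at random.
    """
    kept = [w for w in nbrs if rng.random() < p_s]
    if not kept and nbrs:
        kept = [rng.choice(nbrs)]
    return kept


def scatter_partial_sync(count, nbrs, p_s, rng):
    """Send `count` walkers from a vertex with a non-empty neighbour list nbrs.

    Erasures are drawn once per vertex and step, so all walkers on this vertex
    face the same surviving edges; p_s = 1 recovers the ordinary scatter.
    """
    kept = preserved_out_edges(nbrs, p_s, rng)
    dest = {}
    for _ in range(count):
        w = rng.choice(kept)
        dest[w] = dest.get(w, 0) + 1
    return dest
```

With small $p_s$ the walkers leaving a vertex concentrate on a few destinations, which is where the network savings come from; it is also why walkers that meet become correlated, as analyzed in Appendix B.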


3.4 Results 

FrogWild is significantly faster and uses less network and CPU than GraphLab PR. Let us start with the Twitter graph and the AWS system. In Figure 1(a) we see that, while GraphLab PR takes about 7.5 seconds per iteration (for 12 nodes), FrogWild takes less than 1 second, achieving more than a 7x speedup. Reducing the value of $p_s$ decreases the running time. We see a similar picture when we study the total running time of the algorithms in Figure 1(b).

We plot network performance in Figure 1(c). We get a 1000x improvement compared to the exact GraphLab PR, and more than 10x with respect to doing 1 or 2 iterations of GraphLab PR. In Figure 1(d) we can see that the total CPU usage reported by the engine is also much lower for FrogWild.

We now turn to compare the approximation metrics for the PageRank algorithm. For various $k$, we check two accuracy metrics: Mass captured (Figure 2(a)) and Exact identification (Figure 2(b)). Mass captured is the total exact PageRank of the reported top-$k$ vertices. Exact identification is the number of vertices in the intersection of the reported top-$k$ list and the exact top-$k$ list. We can see that the approximation achieved by FrogWild for $p_s = 1$ and $p_s = 0.7$ always outperforms GraphLab PR with 1 iteration. The approximation achieved by FrogWild with $p_s = 0.4$ is relatively good for both metrics, and with $p_s = 0.1$ it is reasonable for the Mass captured metric.

Figure 1: PageRank performance for various numbers of nodes. Graph: Twitter; system: AWS (Amazon Web Services); FrogWild parameters: 800K initial random walks and 4 iterations. (a) Running time per iteration. (b) Total running time of the algorithms. (c) Total network bytes sent by the algorithm during the execution (does not include ingress time). (d) Total CPU usage time. Notice that this metric may be larger than the total running time, since many CPUs run in parallel.
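Both accuracy metrics are simple to compute once an estimate and the exact PageRank vector are available. A minimal Python sketch (the function names are hypothetical; ties in the ranking are broken by vertex index):

```python
def top_k(scores, k):
    """Set of indices of the k largest entries of scores."""
    return set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])


def mass_captured(estimate, exact, k):
    """Total exact PageRank of the k vertices ranked highest by the estimate."""
    return sum(exact[i] for i in top_k(estimate, k))


def exact_identification(estimate, exact, k):
    """Number of vertices shared by the estimated and the exact top-k lists."""
    return len(top_k(estimate, k) & top_k(exact, k))
```

For example, with `exact = [0.4, 0.3, 0.2, 0.1]` and `estimate = [0.35, 0.1, 0.3, 0.25]`, the reported top-2 set is {0, 2}: it captures mass 0.4 + 0.2 = 0.6 and exactly identifies one of the two true top vertices.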

In Figure 3 we can see the tradeoff between accuracy, total running time, and network usage. The performance of FrogWild is evaluated for various numbers of iterations and values of $p_s$. The results show that, with accuracy comparable to GraphLab PR, FrogWild has much lower running time and network usage. Figure 4 illustrates how much network traffic we save using FrogWild. The area of each circle is proportional to the number of bytes sent by each algorithm.


We also compare FrogWild to an approximation strategy that uses the simple sparsification technique described in Section 2.4. First, the graph is sparsified by deleting each edge with probability $r$; then GraphLab PR is executed. In Figure 5 we can see that FrogWild outperforms this approach in terms of running time while achieving comparable accuracy.

Finally, we plot results for the LiveJournal graph on the VirtualBox system. Figures 6(a,b) show the effect of the number of walkers, $N$, and the number of iterations of FrogWild on the achieved accuracy. Good accuracy and running time (see Figure 6(c,d)) are achieved for 800K initial random walks and 4 iterations of FrogWild. As with the Twitter graph, for the LiveJournal graph we can also see, in Figure 7, that our algorithm is faster and uses much less network, while still maintaining good PageRank accuracy. By varying the number of initial random walks and the number of iterations, we can fine-tune FrogWild for the optimal accuracy-speed tradeoff.

Figure 2: PageRank approximation accuracy for various numbers of top-$k$ PageRank vertices. Graph: Twitter; system: AWS (Amazon Web Services) with 16 nodes; FrogWild parameters: 800K initial random walks and 4 iterations. (a) Mass captured: the total exact PageRank of the reported top-$k$ vertices. (b) Exact identification: the number of vertices in the intersection of the reported top-$k$ and the exact top-$k$ lists.

Figure 3: PageRank approximation accuracy with the “Mass captured” metric for top-100 vertices. Graph: Twitter; system: AWS (Amazon Web Services) with 24 nodes; FrogWild parameters: 800K initial random walks. (a) Accuracy versus total running time (s). (b) Accuracy versus total network bytes sent.

Interestingly, for both graphs (Twitter and LiveJournal), reasonable parameters are 800K initial random walks and 4 iterations, despite the order-of-magnitude difference in the graph sizes. This implies slow growth of the necessary number of frogs with respect to the size of the graph. This scaling behavior is tough to check in practice, but it is explained by our analysis. Specifically, our Remark shows that the number of frogs should scale as $N = O(\ldots)$.
Figure 6: Graph: LiveJournal; system: VirtualBox with 20 nodes. (a) Accuracy for various numbers of initial random walks in FrogWild (with 4 iterations). (b) Accuracy for various numbers of iterations of FrogWild (with 800K initial random walks). (c) Total running time for various numbers of initial random walks in FrogWild (with 4 iterations). (d) Total running time for various numbers of iterations of FrogWild (with 800K initial random walks).
Figure 7: Graph: LiveJournal; system: VirtualBox with 20 nodes; FrogWild parameters: 800K initial random walks. (a) Accuracy versus total running time. (b) Accuracy versus total network bytes sent.
Figure 4: Accuracy versus total running time. Graph: Twitter; system: AWS (Amazon Web Services) with 24 nodes; FrogWild parameters: 800K initial random walks. The area of each circle is proportional to the total network bytes sent by the specific algorithm.



Figure 5: Accuracy versus total running time. Graph: Twitter; system: AWS (Amazon Web Services) with 12 nodes; FrogWild parameters: 800K initial random walks; $q = 1 - r$ is the probability of keeping an edge in the sparsification process.


E. Smirnova, and M. Sokol. Monte carlo methods for 
top-k personalized pagerank lists and name 
disambiguation. CoRR, abs/1008.3775, 2010. 

[7] J. Batson, D. A. Spielman, N. Srivastava, and S.-H. 
Teng. Spectral sparsification of graphs: Theory and 
algorithms. Commun. ACM, 56(8):87-94, Aug. 2013. 

[8] L. Becchetti and C. Castillo. The distribution of
pagerank follows a power-law only for particular
values of the damping factor. In Proceedings of the
15th international conference on World Wide Web,
pages 941-942. ACM, 2006.

[9] A. A. Benczur and D. R. Karger. Approximating s-t
minimum cuts in Õ(n²) time. In Proceedings of the
Twenty-eighth Annual ACM Symposium on Theory of
Computing, STOC ’96, pages 47-55, New York, NY,
USA, 1996. ACM.

Figure 8: Network usage of FrogWild versus the number of initial random walks. Graph: LiveJournal; system: VirtualBox with 20 nodes; FrogWild parameters: 4 iterations.

[10] P. Berkhin. A survey on pagerank computing. Internet 
Mathematics, 2(1):73-120, 2005. 

[11] M. Borokhovich and I. Mitliagkas. FrogWild! code
repository. https://github.com/michaelbor/frogwild, 2014.
Accessed: 2014-10-30.

[12] P. Bremaud. Markov chains: Gibbs fields, Monte Carlo
simulation, and queues, volume 31. Springer, 1999.

[13] A. Z. Broder, R. Lempel, F. Maghoul, and
J. Pedersen. Efficient pagerank approximation via
graph aggregation. Information Retrieval,
9(2):123-138, 2006.

[14] A. Das Sarma, D. Nanongkai, G. Pandurangan, and 
P. Tetali. Distributed random walks. Journal of the 
ACM (JACM), 60(1):2, 2013. 

[15] L. Elden. A note on the eigenvalues of the google 
matrix. arXiv preprint math/0401177, 2004. 

[16] S. Fortunato and A. Flammini. Random walks on
directed networks: the case of pagerank. International
Journal of Bifurcation and Chaos, 17(07):2343-2353,
2007.

[17] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and
C. Guestrin. Powergraph: Distributed graph-parallel
computation on natural graphs. In OSDI, volume 12,
page 2, 2012.

[18] T. Haveliwala and S. Kamvar. The second eigenvalue 
of the google matrix. Stanford University Technical 
Report, 2003. 

[19] J. Heidemann, M. Klier, and F. Probst. Identifying
key users in online social networks: A pagerank based
approach. In ICIS’10, 2010.

[20] H. Kwak, C. Lee, H. Park, and S. Moon. What is
Twitter, a social network or a news media? In WWW
’10: Proceedings of the 19th international conference
on World wide web, pages 591-600, New York, NY,
USA, 2010. ACM.

[21] J. Leskovec and A. Krevl. SNAP Datasets: Stanford 
large network dataset collection. 
http://snap.stanford.edu/data, June 2014. 

[22] P. Lofgren, S. Banerjee, A. Goel, and C. Seshadhri.
Fast-ppr: Scaling personalized pagerank estimation for
large graphs. arXiv preprint arXiv:1404.3181, 2014.

[23] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson,
C. Guestrin, and J. M. Hellerstein. Graphlab: A new
parallel framework for machine learning. In
Conference on Uncertainty in Artificial Intelligence
(UAI), July 2010.

[24] G. Malewicz, M. H. Austern, A. J. Bik, J. G. Dehnert,
I. Horn, N. Leiser, and G. Czajkowski. Pregel: a
system for large-scale graph processing. In Proceedings
of the 2010 ACM SIGMOD International Conference
on Management of data, pages 135-146. ACM, 2010.

[25] R. Mihalcea and P. Tarau. Textrank: Bringing order 
into texts. Association for Computational Linguistics, 
2004. 

[26] M. E. Newman. Power laws, pareto distributions and 
zipf’s law. Contemporary physics, 46(5):323-351, 2005. 

[27] L. Page, S. Brin, R. Motwani, and T. Winograd. The 
pagerank citation ranking: Bringing order to the web. 
1999. 

[28] D. Peleg and J. D. Ullman. An optimal synchronizer 
for the hypercube. In Proceedings of the Sixth Annual 
ACM Symposium on Principles of Distributed 
Computing, PODC ’87, pages 77-85, New York, NY, 
USA, 1987. ACM. 

[29] B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A 
lock-free approach to parallelizing stochastic gradient 
descent. In Advances in Neural Information Processing 
Systems, pages 693-701, 2011. 

[30] A. D. Sarma, S. Gollapudi, and R. Panigrahy. 
Estimating pagerank on graph streams. Journal of the 
ACM (JACM), 58(3):13, 2011. 

[31] N. Satish, N. Sundaram, M. A. Patwary, J. Seo,
J. Park, M. A. Hassaan, S. Sengupta, Z. Yin, and
P. Dubey. Navigating the maze of graph analytics
frameworks using massive graph datasets.

[32] S. Serra-Capizzano. Jordan canonical form of the 
google matrix: a potential contribution to the 
pagerank computation. SIAM Journal on Matrix 
Analysis and Applications, 27(2):305-312, 2005. 

[33] D. Spielman and N. Srivastava. Graph sparsification 
by effective resistances. SIAM Journal on Computing, 
40(6):1913-1926, 2011. 

[34] VirtualBox 4.3. www.virtualbox.org, 2014. 

[35] R. S. Xin, J. E. Gonzalez, M. J. Franklin, and
I. Stoica. Graphx: A resilient distributed graph
system on spark. In First International Workshop on
Graph Data Management Experiences and Systems,
page 2. ACM, 2013.


APPENDIX

A. EDGE ERASURE MODEL

Definition 8 (Edge Erasure Model). An edge erasure model is a process that is independent of the random walks (up to time $t$) and temporarily erases a subset of all edges at time $t$. The event $E^t_{ij}$ represents the erasure of edge $(i,j)$ from the graph at time $t$. The edge is not permanently removed from the graph; it is just disabled and considered again in the next step. The edge erasure models we study satisfy the following properties.

1. Edges are erased independently for different vertices,
$$\mathbb{P}(E^t_{ij}, E^t_{kl}) = \mathbb{P}(E^t_{ij})\,\mathbb{P}(E^t_{kl}), \qquad i \neq k,$$
and across time,
$$\mathbb{P}(E^t_{ij}, E^{t'}_{ij}) = \mathbb{P}(E^t_{ij})\,\mathbb{P}(E^{t'}_{ij}), \qquad t \neq t'.$$

2. Each outgoing edge is preserved (not erased) with probability at least $p_s$:
$$\mathbb{P}(\overline{E^t_{ij}}) \geq p_s.$$

3. Erasures do not exhibit significant negative correlation. Specifically,
$$\mathbb{P}(\overline{E^t_{ij}} \mid \overline{E^t_{ik}}) \geq p_s.$$

4. Erasures in a neighbourhood are symmetric. Any subset of out-going edges of vertex $i$ will be erased with exactly the same probability as another subset of the same cardinality.

The two main edge erasure models we consider are described here. They both satisfy all required properties. Our theory holds for both¹, but in our implementation and experiments we use “At Least One Out-Edge Per Node.”

Example 9 (Independent Erasures). Every edge is preserved independently with probability $p_s$.

Example 10 (At Least One Out-Edge Per Node). This edge erasure model decides all erasures for node $i$ independently, like Independent Erasures, but if all out-going edges of node $i$ are erased, it draws and enables one of them uniformly at random.

B. THEOREM PROOFS

B.1 Proof of Theorem 1

In this section we provide a complete proof of our main results. We start from simple processes and slowly introduce the analytical intricacies of our system one by one, giving guarantees on the performance of each stage.

Process 11 (Fixed Step). Independent walkers start on nodes selected uniformly at random and perform random walks on the augmented graph. This means that teleportation happens with probability $p_T$ and the walk is described by the transition probability matrix (TPM) $Q$, as defined in Section 2.1. Each walker performs exactly $t$ steps before yielding a sample. The number of walkers tends to infinity.

Before we talk about the convergence properties of this Markov chain, we need some definitions.

Definition 12. The $\chi^2$-contrast $\chi^2(\alpha;\beta)$ of $\alpha$ with respect to $\beta$ is defined by
$$\chi^2(\alpha;\beta) = \sum_i \frac{(\alpha(i) - \beta(i))^2}{\beta(i)}.$$

¹ Independent Erasures can lose some walkers when they temporarily lead some nodes to have zero out-degree.




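Lemma 13. Let $\pi \in \Delta^{n-1}$ be a distribution satisfying
$$\min_i \pi(i) \geq \frac{c}{n}$$
for a constant $c < 1$, and let $u \in \Delta^{n-1}$ denote the uniform distribution. Then $\chi^2(u;\pi) \leq \frac{1-c}{c}$.

Proof. Expanding the square and using $\sum_i \pi(i) = 1$,
$$\chi^2(u;\pi) = \sum_i \frac{(1/n - \pi(i))^2}{\pi(i)} = \sum_i \frac{1}{n}\,\frac{1 - n\pi(i)}{n\pi(i)} \leq \sum_i \frac{1}{n}\,\frac{1-c}{c} = \frac{1-c}{c}.$$
Here we used the assumed lower bound on $\pi(i)$ and the fact that $(1-x)/x$ is decreasing in $x$. □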
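Lemma 13 and the rewriting step in its proof are easy to check numerically. The sketch below (hypothetical helper names) draws random distributions of the form $\pi = c\,u + (1-c)\,q$, which satisfy $\pi(i) \geq c/n$, and verifies both the identity and the bound $(1-c)/c$.

```python
import random


def chi2_contrast(alpha, beta):
    """chi^2-contrast of alpha with respect to beta (Definition 12)."""
    return sum((a - b) ** 2 / b for a, b in zip(alpha, beta))


def distribution_with_floor(n, c, rng):
    """Random distribution with every entry at least c/n: pi = c*u + (1-c)*q."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [c / n + (1 - c) * x / total for x in w]


def check_lemma13(n=50, c=0.15, trials=200, seed=0):
    rng = random.Random(seed)
    u = [1.0 / n] * n
    for _ in range(trials):
        pi = distribution_with_floor(n, c, rng)
        lhs = chi2_contrast(u, pi)
        # The rewritten sum from the proof: sum_i (1/n) * (1 - n*pi_i) / (n*pi_i)
        rewritten = sum((1 - n * p) / (n * p) / n for p in pi)
        assert abs(lhs - rewritten) < 1e-9
        assert lhs <= (1 - c) / c + 1e-12
    return True
```

With $c = p_T$ this is exactly the bound on $\chi^2(\pi_0;\pi)$ used in Lemma 14.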


Lemma 14. Let $\pi^t$ denote the distribution of the walkers after $t$ steps. Its $\chi^2$-divergence with respect to the PageRank vector, $\pi$, is
$$\chi^2(\pi^t;\pi) \leq \frac{1-p_T}{p_T}\,(1-p_T)^{t}.$$

Proof. Since $Q$ is ergodic (but non-reversible), we can use the contrast bound in [12], which gives us
$$\chi^2(\pi^t;\pi) \leq \lambda_2^t(\tilde{Q}Q)\,\chi^2(\pi_0;\pi),$$
where $\tilde{Q} = DQ^\top D^{-1}$, for $D = \mathrm{diag}(\pi)$, is the time reversal of $Q$, and $\tilde{Q}Q$ is called the multiplicative reversibilization of $Q$. We want an upper bound on the second largest eigenvalue of $\tilde{Q}Q = DQ^\top D^{-1}Q$. From the Perron-Frobenius theorem, we know that $\lambda_1(Q) = 1$, and from [18] that $|\lambda_2(Q)| \leq 1 - p_T$. The matrix $Q^\top$ is similar to $\tilde{Q} = DQ^\top D^{-1}$, so they have the same spectrum. From this we get the bound
$$|\lambda_2(\tilde{Q}Q)| \leq 1 - p_T.$$
The starting distribution $\pi_0$ is assumed to be uniform, and every element of the PageRank vector is lower bounded by $p_T/n$. From Lemma 13 (with $c = p_T$) we get
$$\chi^2(\pi_0;\pi) \leq \frac{1-p_T}{p_T},$$
and putting everything together we get the statement. □


Process 15 (Truncated Geometric). Independent walkers start on nodes selected uniformly at random and perform random walks on the original graph. This means that there is no teleportation and the walk is described by the TPM $P$ as defined in Section 2.1. Each walker performs a random number of steps before yielding a sample. Specifically, the number of steps follows a geometric distribution with parameter $p_T$. Any walkers still active after $t$ steps are stopped and their positions are acquired as samples. This means that the number of steps is given by the minimum of $t$ and a geometric random variable with parameter $p_T$. The number of walkers tends to infinity.

Lemma 16. The samples acquired from Process 11 and Process 15 follow exactly the same distribution.

Proof. Let $\pi_t$ denote the distribution of the walk after $t$ steps according to $Q$ (Process 11) and $\pi'_t$ denote the distribution of the samples provided by the truncated geometric process (Process 15). Note that both have the same uniform starting distribution, $\pi_0 = \pi'_0 = u = \mathbf{1}_n/n$. For the latter process, the number of steps is $\min(G, t)$ with $\mathbb{P}(G = \tau) = p_T(1-p_T)^{\tau}$ for $\tau = 0, 1, \ldots$, so the sampling distribution is
$$\pi'_t = \sum_{\tau=0}^{t-1} p_T(1-p_T)^{\tau} P^{\tau} u + (1-p_T)^{t} P^{t} u. \tag{5}$$
The last term corresponds to the cut-off we impose at time $t$. Now consider the definition of the TPM $Q$ (Section 2.1). The Markov chain described by $Q$ teleports at each step with probability $p_T$; otherwise, it just proceeds according to the TPM $P$. With every teleportation, the walker starts from the uniform distribution, $u$: any progress made so far is completely “forgotten.” Therefore, we just need to focus on the epoch between the last teleportation and the cut-off time $t$. The times between teleportation events are geometrically distributed with parameter $p_T$. The teleportation process is memoryless and reversible. Starting from time $t$ and looking backwards in time, the last teleportation event is a geometric number of steps away, and with probability $(1-p_T)^{t}$ no teleportation happens during the $t$ steps, in which case the walker has simply followed $P$ from the uniform start. The samples acquired from this process are given by
$$\pi_t = \sum_{\tau=0}^{t-1} p_T(1-p_T)^{\tau} P^{\tau} u + (1-p_T)^{t} P^{t} u, \tag{6}$$
which is exactly the distribution for Process 15 given in (5). □
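The identity between (5) and (6) can be checked numerically on a toy graph. This sketch assumes $Q = (1-p_T)P + p_T\,u\mathbf{1}^\top$ with a column-stochastic $P$ (matching the description of $Q$ as "teleport with probability $p_T$, otherwise follow $P$"; the exact definition in Section 2.1 is not reproduced here), and a graph without dangling vertices. All function names are hypothetical.

```python
def apply_P(P, x):
    """Column-stochastic convention: (P x)_i = sum_j P[i][j] * x[j]."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


def apply_Q(P, x, p_t):
    """Q x = (1 - p_t) P x + p_t u, valid whenever x sums to one (assumed form of Q)."""
    n = len(x)
    return [(1 - p_t) * v + p_t / n for v in apply_P(P, x)]


def fixed_step(P, t, p_t):
    """Process 11: distribution after exactly t steps of Q from the uniform start."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(t):
        x = apply_Q(P, x, p_t)
    return x


def truncated_geometric(P, t, p_t):
    """Process 15: the mixture (5) of P^tau u over the truncated geometric length."""
    n = len(P)
    y = [1.0 / n] * n  # holds P^tau u
    out = [0.0] * n
    for tau in range(t):
        weight = p_t * (1 - p_t) ** tau
        out = [o + weight * v for o, v in zip(out, y)]
        y = apply_P(P, y)
    weight = (1 - p_t) ** t  # walkers still active at the cut-off
    return [o + weight * v for o, v in zip(out, y)]


# Column j holds the out-distribution of vertex j (edges 0->1, 0->2, 1->2, 2->0).
P = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
for t in range(6):
    a, b = fixed_step(P, t, 0.15), truncated_geometric(P, t, 0.15)
    assert max(abs(x - y) for x, y in zip(a, b)) < 1e-12
```

The check works because $Qx = (1-p_T)Px + p_T u$ for any probability vector $x$, so unrolling $Q^t u$ produces the same geometric weights as the truncated walk length.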

Lemma 17 (Mixing Loss). Let $\pi'_t \in \Delta^{n-1}$ denote the distribution of the samples acquired through Process 15. The mass it captures is lower-bounded as follows:
$$\mu_k(\pi'_t) \geq \mu_k(\pi) - \sqrt{\frac{(1-p_T)^{t+1}}{p_T}}.$$

Proof. Let us define $\delta_i = \pi'_t(i) - \pi(i)$. First we show that
$$\mu_k(\pi'_t) \geq \mu_k(\pi) - \|\pi - \pi'_t\|_1. \tag{7}$$
To see this, first consider the case when $\delta_1 = -\delta_2$ and $\delta_i = 0$ for $i = 3, \ldots, n$. The maximum amount of mass that can be missed by $\pi'_t$ in this case is $|\delta_1| + |\delta_2|$. This happens when $\pi_1$ and $\pi_2$ are exactly $|\delta_1| + |\delta_2|$ apart and are flipped in the ordering by $\pi'_t$. This argument generalizes to give us (7). Now assume that the $\chi^2$-divergence of $\pi'_t$ with respect to the PageRank vector $\pi$ is bounded by $\epsilon^2$. Using a variational argument and the KKT conditions, we can show that setting $|\delta_i| = \epsilon\,\pi(i)$ for all $i$ gives the maximum possible $\ell_1$ error:
$$\|\pi - \pi'_t\|_1 \leq \sqrt{\chi^2(\pi'_t;\pi)} \leq \epsilon. \tag{8}$$
The same bound also follows from the Cauchy-Schwarz inequality: $\sum_i |\delta_i| = \sum_i \frac{|\delta_i|}{\sqrt{\pi(i)}}\sqrt{\pi(i)} \leq \sqrt{\sum_i \delta_i^2/\pi(i)}\,\sqrt{\sum_i \pi(i)}$. Finally, combining (7) with (8) and the results from Lemmas 14 and 16 gives us the statement. □
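Both ingredients of the proof, inequality (7) and the Cauchy-Schwarz bound (8), can be spot-checked numerically. In this sketch (hypothetical helper names) random distributions stand in for the PageRank vector $\pi$ and for the sample distribution $\pi'_t$, which is taken as a perturbation of $\pi$; $\mu_k(x)$ is computed as the exact mass of the top-$k$ vertices of $x$.

```python
import math
import random


def chi2_contrast(alpha, beta):
    return sum((a - b) ** 2 / b for a, b in zip(alpha, beta))


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def mass_captured(estimate, exact, k):
    """Exact mass of the k vertices ranked highest by `estimate` (mu_k in the text)."""
    top = sorted(range(len(estimate)), key=lambda i: estimate[i], reverse=True)[:k]
    return sum(exact[i] for i in top)


def random_distribution(n, rng):
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


def check_lemma17(n=30, k=5, trials=500, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        pi = random_distribution(n, rng)  # stands in for the PageRank vector
        noise = random_distribution(n, rng)
        pi_prime = [0.9 * a + 0.1 * b for a, b in zip(pi, noise)]  # perturbed estimate
        gap = l1_distance(pi, pi_prime)
        assert gap <= math.sqrt(chi2_contrast(pi_prime, pi)) + 1e-12  # inequality (8)
        assert mass_captured(pi_prime, pi, k) >= mass_captured(pi, pi, k) - gap - 1e-12  # (7)
    return True
```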

Lemma 18 (Sampling Loss). Let $\hat{\pi}_N$ be the FrogWild! estimator computed from $N$ samples. This is essentially Process 15 with the added complication of random synchronization, as explained in Section 2. Also, let $\pi'_t$ denote the sample distribution after $t$ steps, as defined in Lemma 17. The mass captured by this process is lower bounded as follows, with probability at least $1 - \delta$:
$$\mu_k(\hat{\pi}_N) \geq \mu_k(\pi'_t) - \sqrt{\frac{k}{\delta}\left(\frac{1}{N} + (1-p_s^2)\,p_\cap(t)\right)}.$$
Proof. In this proof, let $x_l^t$ denote the individual (marginal) walk distribution of walker $l$ at time $t$. We know that it follows the dynamics $x_l^{t+1} = P x_l^t$ for all $l \in [N]$, i.e., $x_l^t = x_1^t$. First we show that $\|\hat{\pi}_N - x_1^t\|_2$ is small:
$$\mathbb{P}(\|\hat{\pi}_N - x_1^t\|_2 > \epsilon) \leq \frac{\mathbb{E}[\|\hat{\pi}_N - x_1^t\|_2^2]}{\epsilon^2}. \tag{9}$$
Here we used Markov's inequality. We use $s_l^t$ to denote the position of walker $l$ at time $t$ as a vector. For example, $s_l^t = e_i$ if walker $l$ is at state $i$ at time $t$. The estimator is the empirical distribution of the walker positions, $\hat{\pi}_N = \frac{1}{N}\sum_l s_l^t$. Now let us break down the norm in the numerator of (9):
$$\|\hat{\pi}_N - x_1^t\|_2^2 = \Big\|\frac{1}{N}\sum_l (s_l^t - x_l^t)\Big\|_2^2 = \frac{1}{N^2}\sum_l \|s_l^t - x_l^t\|_2^2 + \frac{1}{N^2}\sum_{l \neq k} (s_l^t - x_l^t)^\top (s_k^t - x_k^t). \tag{10}$$
For the diagonal terms we have:
$$\mathbb{E}[\|s_l^t - x_l^t\|_2^2] = \sum_{i \in [n]} \mathbb{E}\big[\|s_l^t - x_l^t\|_2^2 \,\big|\, s_l^t = e_i\big]\,\mathbb{P}(s_l^t = e_i) = \sum_{i \in [n]} \|e_i - x_l^t\|_2^2\, x_l^t(i) = 1 - \|x_l^t\|_2^2 \leq 1. \tag{11}$$
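The closed form in (11) can be confirmed by direct enumeration over the $n$ possible positions of a single walker. A minimal sketch with hypothetical helper names:

```python
import random


def expected_sq_deviation(x):
    """E||s - x||_2^2 when s = e_i with probability x[i], computed by enumeration."""
    n = len(x)
    total = 0.0
    for i in range(n):
        dev = sum(((1.0 if j == i else 0.0) - x[j]) ** 2 for j in range(n))
        total += x[i] * dev
    return total


def random_distribution(n, seed):
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]
```

For any distribution `x`, `expected_sq_deviation(x)` agrees with `1 - sum(v * v for v in x)`, which is at most 1 as used in the proof. If the $N$ walkers were independent, the cross terms in (10) would vanish and the expected squared error would be exactly $(1 - \|x_1^t\|_2^2)/N$.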


Under the edge erasure model, the trajectories of different walkers are not generally independent. For example, if they happen to meet, they are likely to make the same decision for their next step, since they are faced with the same edge erasures. Now we prove that even when they meet, we can consider them to be independent with some probability that depends on $p_s$.

Consider the position processes for two walkers, $\{s_1^t\}_t$ and $\{s_2^t\}_t$. At each step $t$ and node $i$, a number of out-going edges are erased. Any walkers on $i$ will choose uniformly at random from the remaining edges. Now consider this alternative process.


Process 19 (Blocking Walk). A blocking walk on the graph under the erasure model follows these steps.

1. Walker $l$ finds herself on node $i$ at time $t$.

2. Walker $l$ draws her next state uniformly from the full set of out-going edges,
$$w \sim \mathrm{Uniform}(\mathcal{N}_o(i)).$$

3. If the edge $(i, w)$ is erased at time $t$, the walker cannot traverse it. We call this event a block and denote it by $B_l^t$. In the event of a block, the walker redraws her next step from the out-going edges of $i$ not erased at time $t$.

4. Otherwise, $w$ is used as the next state.

A blocking walk is exactly equivalent to our original process: walkers end up picking a destination uniformly at random among the edges not erased. From now on we focus on this description of our original process. We use the same notation: $\{s_l^t\}_t$ for the position process and $\{x_l^t\}_t$ for the distribution at time $t$.

Let us focus on just two walkers, $\{s_1^t\}_t$ and $\{s_2^t\}_t$, and consider a third process: two independent random walks on the same graph. We assume that these walks operate on the full graph, i.e., no edges are erased. We denote their positions by $\{u_1^t\}_t$ and $\{u_2^t\}_t$ and their marginal distributions by $\{z_1^t\}_t$ and $\{z_2^t\}_t$.

Definition 20 (Time of First Interference). For two blocking walks, $\tau_I$ denotes the earliest time at which they meet and at least one of them experiences blocking:
$$\tau_I = \min\{t : s_1^t = s_2^t \text{ and } B_1^t \cup B_2^t \text{ occurs}\}.$$
We call this quantity the time of first interference.

Lemma 21 (Process equivalence). For two walkers, the blocking walk and the independent walk are identical until the time of first interference. That is, assuming the same starting distributions, $x_1^0 = z_1^0$ and $x_2^0 = z_2^0$, we have
$$x_1^t = z_1^t \quad \text{and} \quad x_2^t = z_2^t \qquad \forall\, t < \tau_I.$$

Proof. The two processes are equivalent for as long as the blocking walkers make independent decisions, effectively picking uniformly from the full set of edges (before erasures). From the independence of erasures across time and vertices in Definition 8, as long as the two walkers do not meet, they are making independent choices. Furthermore, since erasures are symmetric, the walkers will effectively be choosing uniformly over the full set of out-going edges.

Now consider any time $t$ at which the blocking walkers meet. As long as neither of them blocks, they are by definition taking independent steps uniformly over the set of all outgoing edges, maintaining equivalence to the independent walks process. This concludes the proof. □


Lemma 22. Let all walkers start from the uniform distribution. The probability that the time of first interference comes before time $t$ is upper bounded as follows:
$$\mathbb{P}(\tau_I < t) \leq (1 - p_s^2)\,p_\cap(t).$$

Proof. Let $M_t$ be the event of a meeting at time $t$, $M_t = \{s_1^t = s_2^t\}$. In the proof of Theorem 2 we establish that $\mathbb{P}(M_t) \leq \rho^t/n$, where $\rho$ is the maximum row sum of the transition matrix $P$. Now denote the event of an interference at time $t$ as $I_t = M_t \cap (B_1^t \cup B_2^t)$, where $B_l^t$ denotes the event of blocking, as described in Process 19. Now,
$$\mathbb{P}(I_t) = \mathbb{P}(M_t \cap (B_1^t \cup B_2^t)) = \mathbb{P}(B_1^t \cup B_2^t \mid M_t)\,\mathbb{P}(M_t).$$
For the probability of a block given that the walkers meet at time $t$,
$$\mathbb{P}(B_1^t \cup B_2^t \mid M_t) = 1 - \mathbb{P}(\overline{B_1^t} \cap \overline{B_2^t} \mid M_t) = 1 - \mathbb{P}(\overline{B_1^t} \mid \overline{B_2^t}, M_t)\,\mathbb{P}(\overline{B_2^t} \mid M_t) \leq 1 - p_s^2.$$
To get the last inequality we used, from Definition 8, the lower bound on the probability that an edge is not erased, and the lack of negative correlations in the erasures. Combining the above results, we get
$$\mathbb{P}(\tau_I < t) = \mathbb{P}\Big(\sum_{\tau=1}^{t} \mathbb{1}\{I_\tau\} \geq 1\Big) \leq \sum_{\tau=1}^{t} \mathbb{P}(I_\tau) \leq \sum_{\tau=1}^{t} (1 - p_s^2)\,\mathbb{P}(M_\tau) \leq (1 - p_s^2)\,p_\cap(t),$$
which proves the statement. □
Now we can bound the off-diagonal terms in (10):
$$\mathbb{E}[(s_l^t - x_l^t)^\top (s_k^t - x_k^t)] = \mathbb{E}[(s_l^t - x_l^t)^\top (s_k^t - x_k^t) \mid \tau_I < t]\,\mathbb{P}(\tau_I < t) + \mathbb{E}[(s_l^t - x_l^t)^\top (s_k^t - x_k^t) \mid \tau_I \geq t]\,\mathbb{P}(\tau_I \geq t).$$
In the second term, the case when $l$ and $k$ have not interfered, the trajectories are independent by Lemma 21 and the cross-covariance is 0. In the first term, the cross-covariance is maximized when $s_l^t = s_k^t$. That is,
$$\mathbb{E}[(s_l^t - x_l^t)^\top (s_k^t - x_k^t) \mid \tau_I < t] \leq \mathbb{E}[\|s_l^t - x_l^t\|_2^2] \leq 1.$$
From this and Lemma 22 we get
$$\mathbb{E}[(s_l^t - x_l^t)^\top (s_k^t - x_k^t)] \leq (1 - p_s^2)\,p_\cap(t),$$
and in combination with (11) we get from (10) that
$$\mathbb{E}[\|\hat{\pi}_N - x_1^t\|_2^2] \leq \frac{1}{N} + \frac{(N-1)(1 - p_s^2)\,p_\cap(t)}{N}. \tag{12}$$
Finally, we can plug this into (9), and since all marginals $x_l^t$ are the same, and denoted by $\pi'_t$, we get
$$\mathbb{P}(\|\hat{\pi}_N - \pi'_t\|_2 > \epsilon) \leq \frac{1 + (1 - p_s^2)\,p_\cap(t)\,(N-1)}{N\epsilon^2}. \tag{13}$$


Let $\pi'_t|_S$ denote the restriction of the vector $\pi'_t$ to the set $S$. That is, $\pi'_t|_S(i) = \pi'_t(i)$ if $i \in S$ and 0 otherwise. Now we show that for any set $S$ of cardinality $k$,
$$|\pi'_t(S) - \hat{\pi}_N(S)| \leq \|(\pi'_t - \hat{\pi}_N)|_S\|_1 \leq \sqrt{k}\,\|(\pi'_t - \hat{\pi}_N)|_S\|_2 \leq \sqrt{k}\,\|\pi'_t - \hat{\pi}_N\|_2. \tag{14}$$
Here we used the fact that for a $k$-length vector $x$, $\|x\|_1 \leq \sqrt{k}\,\|x\|_2$, and $\|x|_S\| \leq \|x\|$. We define the top-$k$ sets
$$\hat{S} = \arg\max_{S \subseteq [n],\, |S| = k} \hat{\pi}_N(S), \qquad S^* = \arg\max_{S \subseteq [n],\, |S| = k} \pi'_t(S).$$
Per these definitions,
$$\hat{\pi}_N(\hat{S}) = \max_{S \subseteq [n],\, |S| = k} \hat{\pi}_N(S) \geq \hat{\pi}_N(S^*) \geq \pi'_t(S^*) - \sqrt{k}\,\|\pi'_t - \hat{\pi}_N\|_2. \tag{15}$$
The last inequality is a consequence of (14). Now, using the inequality in (13) and denoting the LHS probability as $\delta$, we get the statement of Lemma 18. □
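Inequalities (14) and (15) hold for arbitrary nonnegative vectors, so they can be spot-checked with random data. In this sketch (the function name is hypothetical) `a` plays the role of $\pi'_t$ and `b` the role of $\hat{\pi}_N$.

```python
import math
import random


def check_restriction_bounds(n=40, k=6, trials=1000, seed=0):
    """Check (14) for random |S| = k and the top-k consequence (15)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.random() for _ in range(n)]
        b = [rng.random() for _ in range(n)]
        l2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        S = rng.sample(range(n), k)
        assert abs(sum(a[i] - b[i] for i in S)) <= math.sqrt(k) * l2 + 1e-12  # (14)
        top_a = sorted(range(n), key=lambda i: a[i], reverse=True)[:k]
        top_b = sorted(range(n), key=lambda i: b[i], reverse=True)[:k]
        # (15): b's own top-k mass is at least a(S*) - sqrt(k) * ||a - b||_2
        assert sum(b[i] for i in top_b) >= sum(a[i] for i in top_a) - math.sqrt(k) * l2 - 1e-12
    return True
```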


Combining the results of Lemma 17 and Lemma 18, we establish the main result, Theorem 1.

B.2 Proof of Theorem 2

Proof. Let $u \in \Delta^{n-1}$ denote the uniform distribution over $[n]$, i.e., $u_i = 1/n$. The two walks start from the same initial uniform distribution, $u$, and independently follow the same law, $Q$. Hence, at time $t$ they have the same marginal distribution, $p^t = Q^t u$. From the definition of the augmented transition probability matrix, $Q$, in Section 2.1, we get that
$$\pi_i \geq \frac{p_T}{n}, \qquad \forall i \in [n].$$
Equivalently, there exists a distribution $q \in \Delta^{n-1}$ such that
$$\pi = p_T u + (1 - p_T) q.$$
Now using this, along with the fact that $\pi$ is the invariant distribution associated with $Q$ (i.e., $\pi = Q^t \pi$ for all $t \geq 0$), we get that for any $t \geq 0$,
$$\|\pi\|_\infty = \|Q^t \pi\|_\infty = \|Q^t p_T u + Q^t (1 - p_T) q\|_\infty \geq p_T \|Q^t u\|_\infty.$$
For the last inequality, we used the fact that $Q$ and $q$ contain non-negative entries. Now we have a useful upper bound for the maximal element of the walks' distribution at time $t$:
$$\|p^t\|_\infty \leq \frac{\|\pi\|_\infty}{p_T}. \tag{16}$$
Let $M_t$ be the indicator random variable for the event of a meeting at time $t$,
$$M_t = \mathbb{1}\{\text{walkers meet at time } t\}.$$
Then, $\mathbb{P}(M_t = 1) = \sum_i p_i^t p_i^t = \|p^t\|_2^2$. Since $p^0$ is the uniform distribution, i.e., $p_i^0 = 1/n$ for all $i$, we have $\|p^0\|_2^2 = 1/n$. We can also bound the $\ell_2$ norm of the distribution at other times. First, we upper bound the $\ell_2$ norm by the $\ell_\infty$ norm:
$$\|p^t\|_2^2 = \sum_i (p_i^t)^2 \leq \|p^t\|_\infty \sum_i p_i^t = \|p^t\|_\infty.$$
Here we used the fact that $p_i^t \geq 0$ and $\sum_i p_i^t = 1$. Now, combining the above results, we get
$$p_\cap(t) = \mathbb{P}\Big(\sum_{\tau=0}^{t} M_\tau \geq 1\Big) \leq \sum_{\tau=0}^{t} \mathbb{E}[M_\tau] = \sum_{\tau=0}^{t} \mathbb{P}(M_\tau = 1) = \sum_{\tau=0}^{t} \|p^\tau\|_2^2 \leq \frac{1}{n} + \frac{t\,\|\pi\|_\infty}{p_T}.$$
For the last inequality, we used (16) for $\tau \geq 1$ and $\|p^0\|_2^2 = 1/n$. This proves the theorem statement. □
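The two bounds in this proof can be checked on a small graph. This sketch assumes, as in the Lemma 16 discussion, that $Q = (1-p_T)P + p_T\,u\mathbf{1}^\top$ with a column-stochastic $P$ and no dangling vertices; $\pi$ is approximated by power iteration, and the star graph in the usage example is an illustrative choice. Function names are hypothetical.

```python
def apply_Q(P, x, p_t):
    """Q x = (1 - p_t) P x + p_t u for a column-stochastic P (assumed form of Q)."""
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [(1 - p_t) * v + p_t / n for v in Px]


def pagerank(P, p_t, iters=500):
    """Approximate the invariant distribution of Q by power iteration."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(iters):
        x = apply_Q(P, x, p_t)
    return x


def check_theorem2(P, p_t, t_max=20):
    """Check (16) and the bound on sum_tau ||p^tau||_2^2; returns that sum."""
    n = len(P)
    pi = pagerank(P, p_t)
    bound_inf = max(pi) / p_t  # right-hand side of (16)
    p = [1.0 / n] * n          # p^0 = u
    meet_sum = 0.0
    for t in range(t_max + 1):
        assert max(p) <= bound_inf + 1e-9                           # (16)
        meet_sum += sum(v * v for v in p)                           # P(M_t = 1) = ||p^t||_2^2
        assert meet_sum <= 1.0 / n + t * max(pi) / p_t + 1e-9       # final bound
        p = apply_Q(P, p, p_t)
    return meet_sum


# Star graph: vertices 1..4 link to hub 0, and the hub links back to all of them.
P = [[0.0, 1.0, 1.0, 1.0, 1.0],
     [0.25, 0.0, 0.0, 0.0, 0.0],
     [0.25, 0.0, 0.0, 0.0, 0.0],
     [0.25, 0.0, 0.0, 0.0, 0.0],
     [0.25, 0.0, 0.0, 0.0, 0.0]]
print(check_theorem2(P, 0.15))
```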


B.3 Proof of Proposition

Proof. The expected maximum value of $n$ independent draws from a power-law distribution with parameter $\theta$ is shown in [26] to be
$$\mathbb{E}X_{\max} = \ldots$$
A simple application of Markov's inequality gives us the statement. □